Syntactic Sentence Fusion Techniques for Bengali
Authors
Abstract
This paper describes several syntactic sentence fusion techniques for Bengali, a language of the Indo-Aryan family. First, a clause identification and classification system marks clause boundaries and classifies each clause as a principal clause or a subordinate clause. A rule-based sentence classification system then categorizes sentences as simple, complex, or compound. Finally, the syntactic sentence fusion system uses the sentence class and the clause types to fuse two textually entailed sentences, drawing on verb paradigm information and noun morphological information. The system outputs are compared with a gold-standard data set through manual evaluation and BLEU scores. The evaluation yields good accuracy. The syntactic sentence fusion technique developed in this work may also be applied to other Indian languages.
Keywords: Clause Identification and Classification, Sentence Type, Syntactic Sentence Fusion, Evaluation.
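The sentence classification step can be illustrated with a minimal sketch. It assumes that each sentence has already been split into clauses labelled principal or subordinate by the clause identification stage, and it applies the textbook rules: one principal clause alone gives a simple sentence, one principal clause with at least one subordinate clause gives a complex sentence, and two or more principal clauses give a compound sentence. The names `Clause`, `ClauseType`, `SentenceType`, and `classify_sentence` are hypothetical, and folding compound-complex sentences into the compound class is an assumption, since only three classes are named; the paper's actual rules may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ClauseType(Enum):
    PRINCIPAL = "principal"
    SUBORDINATE = "subordinate"


class SentenceType(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    COMPOUND = "compound"


@dataclass
class Clause:
    # Hypothetical container for the output of the clause identification stage.
    text: str
    kind: ClauseType


def classify_sentence(clauses: List[Clause]) -> SentenceType:
    """Assign a sentence type from its clause labels (assumed rule set)."""
    principal = sum(1 for c in clauses if c.kind is ClauseType.PRINCIPAL)
    subordinate = sum(1 for c in clauses if c.kind is ClauseType.SUBORDINATE)
    if principal == 0:
        raise ValueError("a sentence needs at least one principal clause")
    if principal >= 2:
        # Assumption: compound-complex sentences are folded into COMPOUND.
        return SentenceType.COMPOUND
    if subordinate >= 1:
        return SentenceType.COMPLEX
    return SentenceType.SIMPLE


if __name__ == "__main__":
    sentence = [
        Clause("যদি বৃষ্টি হয়", ClauseType.SUBORDINATE),  # "if it rains"
        Clause("আমি বাড়িতে থাকব", ClauseType.PRINCIPAL),  # "I will stay at home"
    ]
    print(classify_sentence(sentence).value)  # prints "complex"
```

Keeping the rule on clause counts alone makes it independent of how clause boundaries were found, so the same function could sit behind either a rule-based or a statistical clause identifier.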
Similar resources
Finding Emotion Holder from Bengali Blog Texts: An Unsupervised Syntactic Approach
This paper presents two different approaches for identifying emotion holders from Bengali blog sentences. Two types of strategies yield average agreement measures of 0.78 and 0.80 for annotating emotion holders with respect to all emotion classes. The baseline model is developed based on the combinations of various part-of-speech (POS) features extracted from the phrase-based similarities. The ...
Bengali text summarization by sentence extraction
Text summarization is the process of producing an abstract or summary by selecting significant portions of the information from one or more texts. In automatic text summarization, a text is given to the computer, and the computer returns a shorter, less redundant extract or abstract of the original text(s). Many techniques have been developed for summarizing English text(s). But, a very f...
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and the syntactic constituents of a sentence. In this paper we present a semantic role labeling system for Persian that uses a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Topic-Based Bengali Opinion Summarization
This paper describes the development of an opinion summarization system that works on a Bengali news corpus. The system identifies the sentiment information in each document, aggregates it, and represents the summary information as text. The present system follows a topic-sentiment model for sentiment identification and aggregation. The topic-sentiment model is designed as discourse lev...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts that formed the small research corpora were retrieved from two sources: the official USE database, as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website used by pupils for USE training. The two corpora are balanced in size: USE has 11,934 tokens and “Neznaika” has 11,918 tokens. We share Biber’...